
where $\lambda$ is a hyperparameter that balances the two terms. $H^l_c$ is the $c$-th full-precision filter of the $l$-th convolutional layer, and $\hat{H}^l_c$ denotes its corresponding reconstructed filter; $\mathrm{MSE}(\cdot)$ represents the mean square error (MSE) loss. The second term minimizes the intraclass compactness, since the binarization process causes feature variations. $f_{C,s}(\hat{H})$ denotes the feature map of the last convolutional layer for the $s$-th sample, and $\bar{f}_{C,s}(\hat{H})$ denotes the class-specific mean feature map for the corresponding samples. Combining $L_{\hat{H}}$ with the conventional loss $L_{CE}$, we obtain the final loss:

$$L = L_{CE} + L_{\hat{H}}. \qquad (4.18)$$

Both $L$ and its derivatives can be computed directly with an efficient automatic differentiation package.
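The following is a minimal PyTorch-style sketch of how such a loss could be assembled. The function names, tensor shapes, and the placement of $\lambda$ on the filter-reconstruction term are illustrative assumptions, not the exact implementation.

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(full_filters, recon_filters):
    # First term of L_hat{H}: MSE between each full-precision filter H^l_c and
    # its reconstructed (binarized) counterpart hat{H}^l_c, summed over layers.
    return sum(F.mse_loss(h_hat, h) for h, h_hat in zip(full_filters, recon_filters))

def intraclass_compactness(last_feats, labels):
    # Second term: pull each sample's last-layer feature map f_{C,s}(hat{H})
    # toward the mean feature map of its class, bar{f}_{C,s}(hat{H}).
    loss = last_feats.new_zeros(())
    for c in labels.unique():
        feats_c = last_feats[labels == c]              # feature maps of class c
        mean_c = feats_c.mean(dim=0, keepdim=True)     # class-specific mean feature map
        loss = loss + F.mse_loss(feats_c, mean_c.expand_as(feats_c))
    return loss

def total_loss(logits, labels, full_filters, recon_filters, last_feats, lam=1e-3):
    # Eq. (4.18): L = L_CE + L_hat{H}.  Here lam (lambda) is assumed to scale the
    # reconstruction term; its exact placement follows the preceding definition of L_hat{H}.
    l_hat_h = lam * reconstruction_loss(full_filters, recon_filters) \
              + intraclass_compactness(last_feats, labels)
    return F.cross_entropy(logits, labels) + l_hat_h
```

Since every term is built from differentiable tensor operations, calling `total_loss(...).backward()` yields the gradients of $L$ automatically.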

4.3.5 Ablation Study

We tested different values of $\beta_P$ for our method on the CIFAR-10 dataset, as shown on the right side of Fig. 4.9. We can see that as $\beta_P$ increases, the accuracy first increases but then decreases once $\beta_P$ exceeds 2. This validates that the performance loss between the Child and Parent models is a significant measure for the 1-bit CNN search. As $\beta_P$ increases, CP-NAS tends to select architectures with fewer convolutional operations, and the imbalance between the two elements in our CP model leads to a performance drop.
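For concreteness, below is a minimal sketch of how $\beta_P$ might trade off the two elements when scoring a candidate operation. The score form, argument names, and sign conventions are assumptions for illustration; the actual CP-NAS formulation may differ.

```python
def op_selection_score(acc_child, acc_parent, beta_p):
    # Hypothetical CP-NAS-style score for a candidate operation:
    # reward the 1-bit Child model's performance and penalize the performance
    # loss between the full-precision Parent and the binarized Child.
    # With beta_p = 0 only the performance measure remains (the BNAS-like case).
    performance_measure = acc_child
    performance_loss = acc_parent - acc_child   # accuracy drop caused by binarization
    return performance_measure - beta_p * performance_loss
```

Under such a score, a large $\beta_P$ favors operations whose binarized accuracy stays close to the full-precision one, which is consistent with the tendency to select fewer convolutional operations.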

We also compare the architectures obtained by CP-NAS, Random, PC (PC-DARTS), and BNAS, as shown in Fig. 4.9. Unlike the full-precision case, Random and PC-DARTS lack the necessary guidance and therefore perform poorly on binarized architecture search. Both BNAS and CP-NAS use an evaluation indicator for operation selection. In contrast, our CP-NAS additionally uses the performance loss, which allows it to outperform the other three strategies.

Efficiency. As shown by XNOR-Net, 1-bit CNNs are very efficient and promising for resource-limited devices. Our CP-NAS achieves performance comparable to that of the full-precision hand-crafted model with up to an estimated 11× memory saving and 58× speedup, which merits further research and will benefit a wide range of edge computing applications.
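As a back-of-the-envelope illustration of where such memory savings come from, the sketch below compares 32-bit and 1-bit weight storage for made-up layer sizes. The parameter counts, the choice of which layers stay full precision, and the omission of per-filter scaling factors are all assumptions; the exact figure depends on the full-precision share of the actual searched architecture.

```python
def model_memory_bits(layer_params, binarized):
    # Rough estimate: binarized layers store 1 bit per weight, full-precision
    # layers store 32 bits per weight (per-filter scaling factors ignored).
    return sum(n * (1 if b else 32) for n, b in zip(layer_params, binarized))

# Hypothetical parameter counts; first and last layers kept full precision,
# as is common practice for 1-bit CNNs.
params = [1_728, 2_000_000, 4_000_000, 6_000_000, 512_000]
mask = [False, True, True, True, False]

fp32_bits = model_memory_bits(params, [False] * len(params))
mixed_bits = model_memory_bits(params, mask)
print(f"estimated memory saving: {fp32_bits / mixed_bits:.1f}x")  # ~14x for these made-up sizes
```

The per-layer saving is bounded by 32×, so the model-level figure is driven down by whatever remains in full precision.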


FIGURE 4.9
The results (right) for different $\beta_P$ on CIFAR-10, and the 1-bit CNN results (left) for different search strategies on CIFAR-10, including random search, PC (PC-DARTS), BNAS, and CP-NAS. We approximately implement BNAS by setting $\beta_P$ to 0 in CP-NAS, i.e., only the performance measure is used for operation selection.